arXiv:1506.05261v1 [cs.DC] 17 Jun 2015


Dynamic Service Migration in Mobile Edge-Clouds 

Shiqiang Wang*, Rahul Urgaonkar†, Murtaza Zafer‡^, Ting He†, Kevin Chan§, and Kin K. Leung*
*Imperial College London, United Kingdom, Email: {shiqiang.wang11, kin.leung}@imperial.ac.uk
†IBM T. J. Watson Research Center, Yorktown Heights, NY, United States, Email: {rurgaon, the}@us.ibm.com
‡Nyansa Inc., Palo Alto, CA, United States, Email: murtaza.zafer.us@ieee.org
§Army Research Laboratory, Adelphi, MD, United States, Email: kevin.s.chan.civ@mail.mil


Abstract —We study the dynamic service migration problem in 
mobile edge-clouds that host cloud-based services at the network 
edge. This offers the benefits of reduction in network overhead 
and latency but requires service migrations as user locations 
change over time. It is challenging to make these decisions in an 
optimal manner because of the uncertainty in node mobility as 
well as possible non-linearity of the migration and transmission 
costs. In this paper, we formulate a sequential decision making 
problem for service migration using the framework of Markov 
Decision Process (MDP). Our formulation captures general 
cost models and provides a mathematical framework to design 
optimal service migration policies. In order to overcome the 
complexity associated with computing the optimal policy, we 
approximate the underlying state space by the distance between 
the user and service locations. We show that the resulting MDP 
is exact for uniform one-dimensional mobility while it provides a 
close approximation for uniform two-dimensional mobility with 
a constant additive error term. We also propose a new algorithm 
and a numerical technique for computing the optimal solution 
which is significantly faster in computation than traditional 
methods based on value or policy iteration. We illustrate the
effectiveness of our approach by simulation using real-world
mobility traces of taxis in San Francisco.

Index Terms —Cloud technologies, edge computing, Markov
decision process (MDP), mobility, optimization, wireless networks 

I. Introduction 

Mobile applications that utilize cloud computing technologies
have become increasingly popular in recent years, with
examples including data streaming and real-time video
processing. Such applications generally consist of a front-end
component running on the mobile device and a back-end
component running on the cloud [1], [2], where the cloud
provides additional data processing and computational
capabilities. With this architecture, it is possible to run
complicated applications on handheld devices that have limited
processing power. However, it also introduces new challenges to
communication networks due to increased network overhead. The
concept of mobile edge-cloud (MEC) has recently emerged as
a promising technique to address these challenges. The core
idea of MEC is to move computation closer to users, where
small servers or data-centers that can host cloud applications
are distributed across the network and connected directly to
entities (such as cellular basestations) at the network edge, as
shown in Fig. 1. This idea has recently received notable
commercial interest [3], [4], and is expected to develop rapidly
with the growth of new mobile applications and more advanced
smartphones. MECs are also more robust than traditional
centralized cloud

^ Contributions of the author to this work are not related to his current
employment at Nyansa Inc.

Figure 1. Application scenario of mobile edge-clouds (MECs).

computing systems [5], because they are distributed and are
thus less impacted by failures at a centralized point. The
idea of distributing cloud servers at the network edge is also
known as cloudlets [5], edge computing [6], fog computing
[7], follow me cloud [8], etc. In all these techniques, each
server is responsible for a small geographical area, although
some servers may not be directly connected to the basestation.
In this paper, we focus on the case where MECs are co-located
with the basestation, while noting that our proposed solution
can be easily extended to the more general scenario.

One of the new challenges that MEC brings in is dynamic 
service placement/migration. As a user moves across different 
geographical areas, should its service be migrated out of the 
original MEC that hosts the service? If so, where should it be 
migrated? There is a tradeoff between the service migration 
cost and the transmission cost (such as communication delay 
and network overhead) between the user and the MEC. Finding
the optimal decision is also challenging because of the
uncertainty in user mobility as well as possible non-linearity
of the migration and transmission costs.

Most existing work on service migration focuses on traditional
cloud environments without explicitly considering the
impact of user mobility. However, user mobility becomes a
key factor in MECs. The performance of MECs in the
presence of user mobility is studied in [9], but decisions on
whether and where to migrate the service are not considered. A
preliminary work on mobility-driven service migration based
on Markov Decision Processes (MDPs) is given in [8], which
mainly considers one-dimensional (1-D) mobility patterns
with a specifically defined cost function. Standard solution
procedures are used to solve this MDP, which can be
time-consuming, especially when the MDP has a large number of
states. Due to real-time dynamics, the cost functions and
transition probabilities of the MDP may change rapidly over time,
thus it is desirable to solve the MDP in an effective manner.
With this motivation, a more effective solution to the 1-D
mobility case was proposed in [10], where the transmission
and migration costs are assumed to be constants whenever
transmission/migration occurs. To the best of our knowledge,
two-dimensional (2-D) mobility has not been considered in the
literature, although it is a much more realistic case than
1-D mobility. The service migration problem is also different
from handover policies in cellular networks, because users can
connect to MECs that are located at remote basestations (see
[11] for more discussion).

In this paper, we use the MDP framework to study service 
migration in MECs. We provide novel contributions compared 
to [8] and [10], by considering general cost models, 2-D user
mobility, and application to real-world traces. The details are 
summarized as follows. 

1) Our formulation captures general cost models and provides
a mathematical framework to design optimal service
migration policies. We note that the resulting problem becomes
difficult to solve due to the large state space. In order
to overcome this challenge, we propose an approximation
of the underlying state space by defining the states as the
distance between the user and the service locations¹. This
approximation becomes exact for uniform 1-D mobility². We
prove several structural properties of the distance-based MDP,
which include a closed-form solution to the discounted sum
cost. We leverage these properties to develop an algorithm for
computing the optimal policy, which reduces the complexity
from $O(N^3)$ (by policy iteration [12, Section 6]) to $O(N^2)$,
where the number of states in the distance-based MDP is
$N + 1$.


2) We show how to use the distance-based MDP to approximate
the solution for 2-D mobility models, which allows
us to efficiently compute a service migration policy for 2-D
mobility. For uniform 2-D mobility, the approximation error
is bounded by a constant. Simulation results comparing our
approximation solution to the optimal solution (where the
optimal solution is obtained from a 2-D MDP) suggest that
it performs very close to optimal, and the proposed
approximation approach obtains the solution significantly faster.

3) We demonstrate how to apply our algorithms in a
practical scenario driven by real mobility traces of taxis in
San Francisco, which consist of multiple users. We compare
the proposed policy with several baseline strategies that
include myopic, never-migrate, and always-migrate policies. It
is shown that the proposed approach offers significant gains
over those baseline approaches.


II. Problem Formulation

Consider a mobile user that accesses a cloud-based service
hosted on the MECs. The set of possible locations is given
by $\mathcal{L}$, where $\mathcal{L}$ is assumed to be finite (but arbitrarily large).
We consider a time-slotted model where the user's location
remains fixed for the duration of one slot and changes from one
slot to the next according to a Markovian mobility model. The
time-slotted model can be regarded as a sampled version of
a continuous-time model, and the sampling can be performed
either at equal intervals over time or right after a cellular
handover instance. In addition, we assume that each location

¹Throughout this paper, by user location we mean the location of the
basestation that the user is associated with.

²The 1-D mobility is an important practical scenario often encountered in
transportation networks, such as vehicles moving along a road.

Figure 2. Timing of the proposed service migration mechanism.


$l \in \mathcal{L}$ is associated with an MEC that can host the service for
the user. The locations in $\mathcal{L}$ are represented as 2-D vectors
and there exists a distance metric $\|l_1 - l_2\|$ that can be used to
calculate the distance between locations $l_1$ and $l_2$. Note that the
distance metric may not be the Euclidean distance. An example of
this model is a cellular network in which the user's location is
taken as the location of its current basestation and the MECs
are co-located with the basestations. As shown in Section IV,
these locations can be represented as 2-D vectors $(i, j)$ with
respect to a reference location (represented by $(0, 0)$) and the
distance between any two locations can be calculated in terms
of the number of hops to reach from one cell to another cell.
We denote the user and service locations at timeslot $t$ as $u(t)$
and $h(t)$ respectively.

Remark: Although we formulate the problem for the case 
of a single user accessing a single service, our solution can be 
applied to manage services for multiple users, as long as each 
user accesses a separate copy of the service. We will illustrate 
such an application in Section V.

The main notations in this paper are summarized in [11].

A. Control Decisions and Costs 

At the beginning of each slot, the MEC controller can
choose from one of the following control options:

1) Migrate the service from location $h(t)$ to some other
location $h'(t) \in \mathcal{L}$. This incurs a cost $c_m(x)$ that is
assumed to be a non-decreasing function of $x$, where
$x$ is the distance between $h(t)$ and $h'(t)$, i.e., $x =
\|h(t) - h'(t)\|$. Once the migration is completed, the
system operates under state $(u(t), h'(t))$. We assume that
the time to perform migration is negligible compared to
the time-scale of node mobility (as shown in Fig. 2).

2) Do not migrate the service. In this case, we have $h'(t) =
h(t)$ and the migration cost is $c_m(0) = 0$.

In addition to the migration cost, there is a transmission cost
incurred by the user for connecting to the currently active
service instance. The transmission cost is related to the distance
between the service and the user after possible migration, and
it is defined as a general non-decreasing function $c_d(x)$, where
$x = \|u(t) - h'(t)\|$. We set $c_d(0) = 0$.

B. Performance Objective 

Let us denote the overall system state at the beginning
of each timeslot (before possible migration) by $s(t) =
(u(t), h(t))$. The state $s(t)$ is named the initial state of
slot $t$. Consider any policy $\pi$ that makes control decisions
based on the state $s(t)$ of the system, and we use $a_\pi(s(t))$
to represent the control action taken when the system is in
state $s(t)$. This action causes the system to transition to a new
intermediate state $s'(t) = (u(t), h'(t)) = a_\pi(s(t))$. We also
use $C_{a_\pi}(s(t))$ to denote the sum of migration and transmission
costs incurred by a control $a_\pi(s(t))$ in slot $t$, and we have
$C_{a_\pi}(s(t)) = c_m(\|h(t) - h'(t)\|) + c_d(\|u(t) - h'(t)\|)$. Starting
from any initial state $s(0) = s_0$, the long-term expected
discounted sum cost incurred by policy $\pi$ is given by


$$V_\pi(s_0) = \lim_{t \to \infty} \mathrm{E}\left\{ \sum_{\tau=0}^{t} \gamma^{\tau} C_{a_\pi}(s(\tau)) \,\middle|\, s(0) = s_0 \right\} \qquad (1)$$

where $0 < \gamma < 1$ is a discount factor.
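
As a rough illustration of the discounted sum in (1), the sketch below accumulates the discounted cost along a single sample path of user locations rather than taking the expectation, and truncates the sum at the path length. It assumes 1-D integer locations with absolute-value distance. The cost functions, the two baseline policies, and all names are hypothetical and are not taken from the paper.

```python
def discounted_cost(user_path, h0, policy, c_m, c_d, gamma):
    """Truncated, single-path version of the discounted sum cost.

    user_path: user locations u(0), u(1), ... (integers, 1-D assumption)
    h0:        initial service location h(0)
    policy:    function (u, h) -> h', the service location after possible migration
    """
    total, h = 0.0, h0
    for t, u in enumerate(user_path):
        h_new = policy(u, h)                         # intermediate service location h'(t)
        slot_cost = c_m(abs(h - h_new)) + c_d(abs(u - h_new))
        total += (gamma ** t) * slot_cost
        h = h_new                                    # h(t+1) = h'(t)
    return total


# Hypothetical cost functions: constant migration cost, linear transmission cost.
def c_m(x):
    return 0.0 if x == 0 else 1.0


def c_d(x):
    return 0.5 * x


def never_migrate(u, h):
    return h


def always_migrate(u, h):
    return u


path = [0, 1, 2, 2, 3]
cost_never = discounted_cost(path, 0, never_migrate, c_m, c_d, 0.9)
cost_always = discounted_cost(path, 0, always_migrate, c_m, c_d, 0.9)
```

On this short path the always-migrate baseline happens to be cheaper, but the ordering depends entirely on the assumed costs and on the path, which is the tradeoff the MDP formulation is meant to resolve.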

Our objective is to design a control policy that minimizes
the long-term expected discounted sum total cost starting from
any initial state, i.e.,

$$V^*(s_0) = \min_{\pi} V_\pi(s_0) \quad \forall s_0. \qquad (2)$$

This problem falls within the class of MDPs with infinite
horizon discounted cost. It is well known that the optimal
solution is given by a stationary policy and can be obtained
as the unique solution to Bellman's equation:

$$V^*(s_0) = \min_{a} \left\{ C_a(s_0) + \gamma \sum_{s_1} P_{a(s_0), s_1} V^*(s_1) \right\} \qquad (3)$$

Figure 3. An example of distance-based MDP with the distances $\{d(t)\}$
(before possible migration) as states. In this example, migration is only
performed at state $N$, and only the possible action of $a(N) = 1$ is shown
for compactness. The solid lines denote state transitions without migration.

it is reasonable to use the distance as an approximation of 
the state space for many practical scenarios of interest, and 
this simplification allows us to formulate a far more tractable 
MDP. We discuss the distance-based MDP in the next section, 
and show how the results on the distance-based MDP can be 
applied to 2-D mobility models and real-world mobility traces 
in Sections IV and V.

In the remainder of this paper, where there is no ambiguity,
we reuse the notations $P$, $C_a(\cdot)$, $V(\cdot)$, and $a(\cdot)$ to
respectively represent transition probabilities, one-timeslot costs,
discounted sum costs, and actions of different MDPs.


where $P_{a(s_0), s_1}$ denotes the probability of transitioning from
state $s'(0) = s'_0 = a(s_0)$ to $s(1) = s_1$. Note that the
intermediate state $s'(t)$ has no randomness when $s(t)$ and $a(\cdot)$
are given, thus we only consider the transition probability from
$s'(t)$ to the next state $s(t+1)$ in (3). Also note that we always
have $h(t+1) = h'(t)$.
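
The fixed point of Bellman's equation (3) can be approximated by standard value iteration. The sketch below is a generic, minimal version under the convention used above: an action is identified with the intermediate state it leads to, `cost[s][a]` is the one-timeslot cost of choosing intermediate state `a` from state `s`, and `P[a][s1]` is the probability of moving from intermediate state `a` to next state `s1`. The two-state example and all names are hypothetical and only serve to exercise the routine.

```python
def value_iteration(cost, P, gamma, tol=1e-12, max_iter=100000):
    """Approximate V* and a greedy policy for a finite MDP with discounted cost."""
    n = len(cost)
    V = [0.0] * n

    def expected_future(values):
        # Expected next-state value for each intermediate state a.
        return [sum(P[a][s1] * values[s1] for s1 in range(n)) for a in range(n)]

    for _ in range(max_iter):
        future = expected_future(V)
        V_new = [min(cost[s][a] + gamma * future[a] for a in range(n)) for s in range(n)]
        change = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if change < tol:
            break

    future = expected_future(V)
    policy = [min(range(n), key=lambda a: cost[s][a] + gamma * future[a]) for s in range(n)]
    return V, policy


# Hypothetical two-state example: both intermediate states lead to either
# state with probability 0.5, and intermediate state 0 is always cheaper to choose.
cost = [[0.0, 5.0],
        [1.0, 2.0]]
P = [[0.5, 0.5],
     [0.5, 0.5]]
V_star, greedy_policy = value_iteration(cost, P, gamma=0.5)
```

For the full state space $(u(t), h(t))$ the number of states grows with the square of the number of locations, which is why the paper moves to a distance-based state space instead of applying this directly.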

C. Characteristics of Optimal Policy 

We next characterize some structural properties of the 
optimal solution. The following lemma states that it is not 
optimal to migrate the service to a location that is farther 
away from the user, as one would intuitively expect. 

Lemma 1. Let $a^*(s) = (u, h')$ denote the optimal action at
any state $s = (u, h)$. Then, we have $\|u - h'\| \le \|u - h\|$. (If
the optimal action is not unique, then there exists at least one
such optimal action.)

Corollary 1. If $c_m(x)$ and $c_d(x)$ are both constants (possibly
of different values) for $x > 0$, and $c_m(0) < c_m(x)$ and
$c_d(0) < c_d(x)$ for $x > 0$, then migrating to locations other
than the current location of the mobile user is not optimal.

See [11] for the proofs of Lemma 1 and Corollary 1.

D. Simplifying the Search Space 

Lemma 1 simplifies the search space for the optimal policy
considerably. However, it is still very challenging to derive
the optimal control policy for the general model presented
above, particularly when the state space $\{s(t)\}$ is large. One
possible approach to address this challenge is to re-define the
state space to represent only the distance between the user
and service locations $d(t) = \|u(t) - h(t)\|$. The motivation
for this comes from the observation that the cost functions in
our model depend only on the distance. Note that in general,
the optimal control actions can be different for two states $s_0$
and $s_1$ that have the same user-service distance. However,

III. Optimal Policy for Distance-Based MDP

In this section, we consider a distance-based³ MDP where
the states $\{d(t)\}$ represent the distances between the user and
the service before possible migration (an example is shown in
Fig. 3), i.e., $d(t) = \|u(t) - h(t)\|$. We define the parameter
$N$ as an application-specific maximum allowed distance, and
we always perform migration when $d(t) \ge N$. We set the
actions $a(d(t)) = a(N)$ for $d(t) > N$, so that we only need to
focus on the states $d(t) \in [0, N]$. After taking action $a(d(t))$,
the system operates in the intermediate state $d'(t) = a(d(t))$,
and the value of the next state $d(t+1)$ follows the transition
probability $P_{d'(t), d(t+1)}$, which is related to the mobility model
of the user. To simplify the solution, we restrict the transition
probabilities $P_{d'(t), d(t+1)}$ according to the parameters $p_0$, $p$,
and $q$ as shown in Fig. 3. Such a restriction is sufficient when
the underlying mobility model is a uniform 1-D random walk
where the user moves one step to the left or right with equal
probability $r_1$ and stays in the same location with probability
$1 - 2r_1$, in which case we can set $p = q = r_1$ and $p_0 = 2r_1$.
It is also sufficient to approximate the uniform 2-D random
walk model, as will be discussed in Section IV-B.
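
The restricted transition structure can be written down directly. The sketch below builds, for each intermediate distance $d'(t)$, the distribution of the next distance $d(t+1)$. It assumes that $p_0$ is the probability of leaving distance 0, $p$ the probability of moving one step farther, and $q$ the probability of moving one step closer (this reading of $p$ and $q$ matches the 2-D parameter choice in Section IV-B). Rows are given one extra entry for distance $N + 1$, where migration is forced; the function name and the example values are hypothetical.

```python
def transition_row(d_prime, N, p0, p, q):
    """Distribution over the next distance d(t+1) in 0..N+1, given intermediate distance d'(t)."""
    if not 0 <= d_prime <= N:
        raise ValueError("intermediate distance must lie in [0, N]")
    row = [0.0] * (N + 2)
    if d_prime == 0:
        row[0] = 1.0 - p0          # user stays with the service
        row[1] = p0                # user moves one step away
    else:
        row[d_prime - 1] = q       # user moves one step closer
        row[d_prime] = 1.0 - p - q
        row[d_prime + 1] = p       # user moves one step farther
    return row


# Uniform 1-D random walk with step probability r1 in each direction,
# mapped to p = q = r1 and p0 = 2 * r1 as described in the text.
r1 = 0.2
N = 5
rows = [transition_row(d, N, p0=2 * r1, p=r1, q=r1) for d in range(N + 1)]
```

Each row has at most three non-zero entries, and this banded structure is what later allows the balance equation to be treated as a second-order difference equation.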

For an action of $d'(t) = a(d(t))$, the new service location
$h'(t)$ is chosen such that $\|h(t) - h'(t)\| = |d(t) - d'(t)|$ and
$\|u(t) - h'(t)\| = d'(t)$. This means that migration happens
along the shortest path that connects $u(t)$ and $h(t)$, and $h'(t)$
is on this shortest path (also note that $d'(t) \le d(t)$ according
to Lemma 1). Such a migration is possible for the 1-D case
where $u(t)$, $h(t)$, and $h'(t)$ are all scalar values. It is also
possible for the 2-D case if the distance metric is properly
defined (see Section IV-B for details). The one-timeslot cost
is then $C_a(d(t)) = c_m(|d(t) - d'(t)|) + c_d(d'(t))$. We define the
cost functions $c_m(x)$ and $c_d(x)$ in a constant-plus-exponential
³We assume that the distance is quantized, as will be the case with the
2-D model discussed in later sections.

Figure 4. Example of exponential cost function $c_d(x)$.

form (shown in (4) and (5) below). Such cost functions can
have different shapes and are thus applicable to many realistic
scenarios. They also have nice properties that allow us to obtain
a closed-form solution to the discounted sum cost, based on which we
design an efficient algorithm for finding the optimal policy.

A. Constant-Plus-Exponential Cost Functions 

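The constant-plus-exponential cost functions are defined as

$$c_m(x) = \begin{cases} 0, & \text{if } x = 0 \\ \beta_c + \beta_l \mu^{x}, & \text{if } x > 0 \end{cases} \qquad (4)$$

$$c_d(x) = \begin{cases} 0, & \text{if } x = 0 \\ \delta_c + \delta_l \theta^{x}, & \text{if } x > 0 \end{cases} \qquad (5)$$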

where $\beta_c$, $\beta_l$, $\delta_c$, $\delta_l$, $\mu$, and $\theta$ are real-valued parameters. Note
that $x = |d(t) - d'(t)|$ in $c_m(x)$, and $x = d'(t)$ in $c_d(x)$. The
values of these parameters are selected such that $c_m(x) \ge 0$,
$c_d(x) \ge 0$, and both $c_m(x)$ and $c_d(x)$ are non-decreasing
in $x$ for $x \ge 0$. Explicitly, we have $\mu \ge 0$; $\beta_l \le 0$ when
$\mu \le 1$; $\beta_l \ge 0$ when $\mu \ge 1$; $\beta_c \ge -\beta_l$; $\theta \ge 0$; $\delta_l \le 0$
when $\theta \le 1$; $\delta_l \ge 0$ when $\theta \ge 1$; and $\delta_c \ge -\delta_l$. We set
$c_m(0) = c_d(0) = 0$ for convenience, because a non-zero cost
for $x = 0$ can be offset by the values of $\beta_c$ and $\delta_c$, thus setting
$c_m(0) = c_d(0) = 0$ does not affect the optimal decision.

With this definition, the values of $\beta_c + \beta_l$ and $\delta_c + \delta_l$ can
be regarded as constant terms of the costs: at least this
amount of cost is incurred when $x > 0$. The parameters $\mu$
and $\theta$ specify the impact of the distance $x$ on the costs, and
their values can be related to the network topology and routing
mechanism of the network. The parameters $\beta_l$ and $\delta_l$ further
adjust the costs proportionally.

An example of the cost function is shown in Fig. 4. This
exponential cost function can be used to approximate an
arbitrary cost function as discussed in [11].
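
A small sketch of the constant-plus-exponential family follows. It assumes the form "zero at $x = 0$, constant plus a scaled exponential for $x > 0$" that the name and the parameter description suggest. The transmission parameters ($\theta = 0.8$, $\delta_c = 1$, $\delta_l = -1$) and the constraint $\beta_c + \beta_l = 1$ with $\mu = 0.8$ are the values used later in Section IV-D; the particular split $\beta_c = 2$, $\beta_l = -1$ and the helper name are hypothetical.

```python
def make_cost(const, scale, base):
    """Return c(x) = 0 for x == 0 and const + scale * base**x for x > 0."""
    def cost(x):
        if x < 0:
            raise ValueError("distance must be non-negative")
        return 0.0 if x == 0 else const + scale * base ** x
    return cost


# Transmission cost: delta_c = 1, delta_l = -1, theta = 0.8, so c_d(x) = 1 - 0.8**x.
c_d = make_cost(1.0, -1.0, 0.8)

# Migration cost: mu = 0.8 with beta_c + beta_l = 1; the split below is an assumption.
c_m = make_cost(2.0, -1.0, 0.8)

# Both functions should be non-negative and non-decreasing in the distance.
transmission_costs = [c_d(x) for x in range(0, 11)]
migration_costs = [c_m(x) for x in range(0, 11)]
```

With a base below 1 and a negative scale the cost saturates as the distance grows, while a base above 1 with a positive scale gives a cost that grows faster than linearly, which is how one family covers differently shaped costs.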


B. Closed-Form Solution to the Discounted Sum Cost 


1) Problem Formulation with Difference Equations: From (1),
we get the following balance equation on the discounted
sum cost for a given policy $\pi$:

$$V_\pi(d(0)) = C_{a_\pi}(d(0)) + \gamma \sum_{d(1) = a_\pi(d(0)) - 1}^{a_\pi(d(0)) + 1} P_{a_\pi(d(0)), d(1)} V_\pi(d(1)). \qquad (6)$$

In the following, we will omit the subscript $\pi$ and write
$d(0)$ as $d$ for short.

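Proposition 1. For a given policy $\pi$, let $\{n_k : k \ge 0\}$ denote
the series of migration states (i.e., $a(n_k) \ne n_k$) as specified by
policy $\pi$, where $0 \le n_k \le N$. The discounted sum cost $V(d)$
for policy $\pi$ can be expressed as

$$V(d) = A_k m_1^{d} + B_k m_2^{d} + D + \begin{cases} H \cdot \theta^{d}, & \text{if } 1 - \frac{\phi_1}{\theta} - \phi_2 \theta \ne 0 \\ H d \cdot \theta^{d}, & \text{if } 1 - \frac{\phi_1}{\theta} - \phi_2 \theta = 0 \end{cases} \qquad (7)$$

where the coefficients $m_1$, $m_2$, $D$, and $H$ are expressed as
follows:

$$m_1 = \frac{1 + \sqrt{1 - 4\phi_1\phi_2}}{2\phi_2}, \quad m_2 = \frac{1 - \sqrt{1 - 4\phi_1\phi_2}}{2\phi_2} \qquad (8)$$

$$D = \frac{\phi_3}{1 - \phi_1 - \phi_2} \qquad (9)$$

$$H = \begin{cases} \dfrac{\phi_4}{1 - \frac{\phi_1}{\theta} - \phi_2 \theta}, & \text{if } 1 - \frac{\phi_1}{\theta} - \phi_2 \theta \ne 0 \\ \dfrac{\phi_4}{\frac{\phi_1}{\theta} - \phi_2 \theta}, & \text{if } 1 - \frac{\phi_1}{\theta} - \phi_2 \theta = 0 \end{cases} \qquad (10)$$

where we define $\phi_1 = \frac{\gamma q}{1 - \gamma(1 - p - q)}$, $\phi_2 = \frac{\gamma p}{1 - \gamma(1 - p - q)}$,
$\phi_3 = \frac{\delta_c}{1 - \gamma(1 - p - q)}$, and $\phi_4 = \frac{\delta_l}{1 - \gamma(1 - p - q)}$.
The constants $A_k$ and $B_k$ are related to the interval $[0, n_0]$
($k = 0$) or $[n_{k-1}, n_k]$ ($k > 0$).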

Proof. The proof is based on solving a difference equation
according to (6), see [11] for details. □
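
The closed form can be sanity-checked numerically. The sketch below codes the coefficients as one reading of (8)-(10), re-derived from the balance equation at a state $d \ge 1$ where no migration occurs, i.e. $V(d) = \delta_c + \delta_l\theta^{d} + \gamma\,[\,qV(d-1) + (1-p-q)V(d) + pV(d+1)\,]$. It then confirms that $A m_1^{d} + B m_2^{d} + D + H\theta^{d}$ satisfies that equation for arbitrary $A$ and $B$. Only the non-degenerate case is covered, and the function names and parameter values are hypothetical.

```python
import math


def coefficients(gamma, p, q, delta_c, delta_l, theta):
    """Return (m1, m2, D, H) for the non-degenerate case of the closed form."""
    denom = 1.0 - gamma * (1.0 - p - q)
    phi1 = gamma * q / denom
    phi2 = gamma * p / denom
    phi3 = delta_c / denom
    phi4 = delta_l / denom
    root = math.sqrt(1.0 - 4.0 * phi1 * phi2)
    m1 = (1.0 + root) / (2.0 * phi2)
    m2 = (1.0 - root) / (2.0 * phi2)
    D = phi3 / (1.0 - phi1 - phi2)
    H = phi4 / (1.0 - phi1 / theta - phi2 * theta)   # assumes this denominator is non-zero
    return m1, m2, D, H


def closed_form_value(d, A, B, m1, m2, D, H, theta):
    return A * m1 ** d + B * m2 ** d + D + H * theta ** d


def balance_residual(d, A, B, gamma, p, q, delta_c, delta_l, theta):
    """Left side minus right side of the no-migration balance equation at distance d >= 1."""
    m1, m2, D, H = coefficients(gamma, p, q, delta_c, delta_l, theta)

    def V(x):
        return closed_form_value(x, A, B, m1, m2, D, H, theta)

    rhs = delta_c + delta_l * theta ** d + gamma * (
        q * V(d - 1) + (1.0 - p - q) * V(d) + p * V(d + 1)
    )
    return V(d) - rhs


# Arbitrary A and B: the homogeneous part must cancel for any choice.
residuals = [
    balance_residual(d, A=0.3, B=-0.7, gamma=0.9, p=0.2, q=0.2,
                     delta_c=1.0, delta_l=-1.0, theta=0.8)
    for d in range(1, 6)
]
```

The residuals vanish up to floating-point error regardless of $A$ and $B$, which is why two extra constraints per interval are needed to pin those two constants down.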


We also note that for two different states $d_1$ and $d_2$, if the
policy $\pi$ has actions $a_\pi(d_1) = d_2$ and $a_\pi(d_2) = d_2$, then

$$V_\pi(d_1) = c_m(|d_1 - d_2|) + V_\pi(d_2). \qquad (11)$$

2) Finding the Coefficients: The coefficients $A_k$ and $B_k$ are
unknowns in the solution (7) that need to be evaluated using
additional constraints. These coefficients may be different for
states within different intervals of $[n_{k-1}, n_k]$. After these
coefficients are determined, (7) holds for all $d \in [n_{k-1}, n_k]$.
We assume $1 - \frac{\phi_1}{\theta} - \phi_2\theta \ne 0$ and $1 - \frac{\phi_2}{\theta} - \phi_1\theta \ne 0$ from
now on; the other cases can be derived in a similar way.

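Coefficients for $0 \le d \le n_0$: One constraint is from the
balance equation (6) for $d = 0$, which is

$$V(0) = \gamma p_0 V(1) + \gamma (1 - p_0) V(0). \qquad (12)$$

By substituting (7) into (12), we get

$$A_0 (1 - \phi_0 m_1) + B_0 (1 - \phi_0 m_2) = D(\phi_0 - 1) + H(\phi_0 \theta - 1) \qquad (13)$$

where $\phi_0 = \frac{\gamma p_0}{1 - \gamma(1 - p_0)}$. Another constraint is obtained by
substituting (7) into (11), which gives

$$A_0 \left( m_1^{n_0} - m_1^{a(n_0)} \right) + B_0 \left( m_2^{n_0} - m_2^{a(n_0)} \right) = \beta_c + \beta_l \mu^{n_0 - a(n_0)} - H \left( \theta^{n_0} - \theta^{a(n_0)} \right). \qquad (14)$$

The values of $A_0$ and $B_0$ can be solved from (13) and (14).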

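Coefficients for $n_{k-1} \le d \le n_k$: Assume that we have
found $V(d)$ for all $d \le n_{k-1}$. By letting $d = n_{k-1}$ in (7), we
have the first constraint given by

$$A_k m_1^{n_{k-1}} + B_k m_2^{n_{k-1}} = V(n_{k-1}) - D - H \cdot \theta^{n_{k-1}}. \qquad (15)$$

For the second constraint, we consider two cases. If $a(n_k) \le
n_{k-1}$, then

$$A_k m_1^{n_k} + B_k m_2^{n_k} = \beta_c + \beta_l \mu^{n_k - a(n_k)} + V(a(n_k)) - D - H \cdot \theta^{n_k}. \qquad (16)$$

If $n_{k-1} < a(n_k) \le n_k - 1$, then

$$A_k \left( m_1^{n_k} - m_1^{a(n_k)} \right) + B_k \left( m_2^{n_k} - m_2^{a(n_k)} \right) = \beta_c + \beta_l \mu^{n_k - a(n_k)} - H \left( \theta^{n_k} - \theta^{a(n_k)} \right). \qquad (17)$$

The values of $A_k$ and $B_k$ can be solved from (15) together
with either (16) or (17).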

3) Solution is in Closed-Form: We note that $A_0$ and $B_0$
can be expressed in closed-form, and $A_k$ and $B_k$ for all $k$
can also be expressed in closed-form by substituting (7) into
(15) and (16) where necessary. Therefore, (7) is a closed-form
solution for all $d \in [0, N]$. Numerically, we can find $V(d)$ for
all $d \in [0, N]$ in $O(N)$ time.

C. Algorithm for Finding the Optimal Policy 

Note that standard approaches for finding the optimal
policy of an MDP include value iteration and policy iteration
[12, Section 6]. Value iteration finds the optimal policy from
Bellman's equation (3) iteratively, which may require a
large number of iterations before converging to the optimal
result. Policy iteration generally requires a smaller number of
iterations, because, in each iteration, it finds the exact values
of the discounted sum cost $V(d)$ for the policy resulting from
the previous iteration, and performs the iteration based on the
exact $V(d)$ values. However, in general, the exact $V(d)$ values
are found by solving a system of linear equations, which has
a complexity of $O(N^3)$ when using Gaussian elimination.

We propose a modified policy-iteration approach for finding
the optimal policy, which uses the above result instead of
Gaussian elimination to compute $V(d)$, and also only checks
for migrating to lower states or not migrating (according to
Lemma 1). The algorithm is shown in Algorithm 1, where
Lines 4-7 find the values of $n_k$, Lines 8-17 find the
discounted sum cost values, and Lines 18-20 update the optimal
policy. The overall complexity for each iteration is $O(N^2)$
in Algorithm 1, which reduces complexity because standard⁴
policy iteration has complexity $O(N^3)$, and the standard value
iteration approach does not compute the exact value function
in each iteration and generally has long convergence time.
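
For comparison, a plain policy iteration on the distance-based MDP can be sketched as follows. This is not the modified algorithm of the paper: policy evaluation here uses repeated sweeps of the balance equation instead of the closed-form solution (7), so it does not have the per-iteration complexity claimed for Algorithm 1. It does share the restricted action set (only $a \le d$, with migration forced at $d = N$). It assumes $N \ge 1$, takes $p$ as the probability of moving one step farther and $q$ as one step closer, and all function names, the tie-breaking margin, and the example parameters are hypothetical.

```python
def q_value(d, a, V, gamma, p0, p, q, c_m, c_d):
    """One-timeslot cost of moving from distance d to a, plus discounted expected future cost."""
    if a == 0:
        future = (1.0 - p0) * V[0] + p0 * V[1]
    else:
        future = q * V[a - 1] + (1.0 - p - q) * V[a] + p * V[a + 1]
    return c_m(d - a) + c_d(a) + gamma * future


def allowed_actions(d, N):
    # Lemma 1: never move farther away (a <= d); at d = N migration is forced (a < N).
    return range(0, d + 1) if d < N else range(0, N)


def evaluate(policy, N, gamma, p0, p, q, c_m, c_d, tol=1e-12, max_sweeps=100000):
    """Iterative policy evaluation (fixed-point sweeps), not the closed-form evaluation."""
    V = [0.0] * (N + 1)
    for _ in range(max_sweeps):
        V_new = [q_value(d, policy[d], V, gamma, p0, p, q, c_m, c_d) for d in range(N + 1)]
        change = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if change < tol:
            break
    return V


def policy_iteration(N, gamma, p0, p, q, c_m, c_d, max_rounds=1000):
    policy = [0] * (N + 1)                      # start by always migrating to distance 0
    V = evaluate(policy, N, gamma, p0, p, q, c_m, c_d)
    for _ in range(max_rounds):
        new_policy = []
        for d in range(N + 1):
            best_a = policy[d]
            best_val = q_value(d, best_a, V, gamma, p0, p, q, c_m, c_d)
            for a in allowed_actions(d, N):
                val = q_value(d, a, V, gamma, p0, p, q, c_m, c_d)
                if val < best_val - 1e-9:       # switch only on a clear improvement
                    best_a, best_val = a, val
            new_policy.append(best_a)
        if new_policy == policy:
            break
        policy = new_policy
        V = evaluate(policy, N, gamma, p0, p, q, c_m, c_d)
    return policy, V


def c_m(x):
    return 0.0 if x == 0 else 2.0 - 0.8 ** x    # assumed beta_c = 2, beta_l = -1, mu = 0.8


def c_d(x):
    return 0.0 if x == 0 else 1.0 - 0.8 ** x    # delta_c = 1, delta_l = -1, theta = 0.8


N, gamma, r = 10, 0.9, 0.1
policy, V = policy_iteration(N, gamma, p0=6 * r, p=2.5 * r, q=1.5 * r, c_m=c_m, c_d=c_d)
```

A run like this can serve as a reference when checking an implementation of the closed-form evaluation, since both should agree on $V(d)$ for the same policy.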

IV. Approximate Solution for 2-D Mobility Model

In this section, we show that the distance-based MDP can
be used to find a near-optimal service migration policy, where
the user conforms to a uniform 2-D random walk mobility
model on infinite space. This mobility model can be used
to approximate real-world mobility traces (see Section V).
We consider a hexagon cell structure, but the approximation
procedure can also be used for other 2-D mobility models
(such as Manhattan grid) with some parameter changes. The
user is assumed to transition to one of its six neighboring cells
at the beginning of each timeslot with probability $r$, and stay
in the same cell with probability $1 - 6r$.

⁴We use the term “standard” here to distinguish it from the modified policy
iteration mechanism proposed in Algorithm 1.


Algorithm 1 Modified policy-iteration algorithm based on
difference equations

 1: Initialize $a(d) = 0$ for all $d = 0, 1, 2, ..., N$
 2: Find constants $\phi_0$, $\phi_1$, $\phi_2$, $\phi_3$, $\phi_4$, $m_1$, $m_2$, $D$, and $H$
 3: repeat
 4:   $k \leftarrow 0$
 5:   for $d = 1 ... N$ do
 6:     if $a(d) \ne d$ then
 7:       $n_k \leftarrow d$, $k \leftarrow k + 1$
 8:   for all $n_k$ do
 9:     if $k = 0$ then
10:       Solve for $A_0$ and $B_0$ from (13) and (14)
11:       Find $V(d)$ with $0 \le d \le n_k$ from (7) with $A_0$ and $B_0$ found above
12:     else if $k > 0$ then
13:       if $a(n_k) \le n_{k-1}$ then
14:         Solve for $A_k$ and $B_k$ from (15) and (16)
15:       else
16:         Solve for $A_k$ and $B_k$ from (15) and (17)
17:       Find $V(d)$ with $n_{k-1} < d \le n_k$ from (7) with $A_k$ and $B_k$ found above
18:   for $d = 1 ... N$ do
19:     $a_{prev}(d) = a(d)$
20:     $a(d) = \arg\min_{a \le d} \left\{ C_a(d) + \gamma \sum_{j = a - 1}^{a + 1} P_{aj} V(j) \right\}$
21: until $a_{prev}(d) = a(d)$ for all $d$
22: return $a(d)$ for all $d$


A. Offset-Based MDP 

Define the offset of the user from the service as a 2-D vector
$e(t) = u(t) - h(t)$ (recall that $u(t)$ and $h(t)$ are also 2-D
vectors). Due to the space-homogeneity of the mobility model,
it is sufficient to model the state of the MDP by $e(t)$ rather than
$s(t)$. The distance metric $\|l_1 - l_2\|$ is defined as the minimum
number of hops that are needed to reach from cell $l_1$ to cell
$l_2$ on the hexagon model.

We name the states with the same value of $\|e(t)\|$ as a ring,
and express the states $\{e(t)\}$ with polar indexes $(i, j)$, where
the first index $i$ refers to the ring index, and the second index $j$
refers to each of the states within the ring, as shown in Fig. 5.
For $e(t) = (i, j)$, we have $\|e(t)\| = i$. If $u(t) = h(t)$ (i.e., the
actual user and service locations (cells) are the same), then we
have $e(t) = (0, 0)$ and $\|e(t)\| = 0$.

Similarly to the distance-based MDP, we assume in the
2-D MDP that we always migrate when $\|e(t)\| \ge N$, where
$N$ is a design parameter, and we only consider the state
space $\{e(t)\}$ with $\|e(t)\| \le N$. The system operates in the
intermediate state $e'(t) = u(t) - h'(t) = a(e(t))$ after taking
action $a(e(t))$. The next state $e(t+1)$ is determined
probabilistically according to the transition probability $P_{e'(t), e(t+1)}$.
We have $P_{e'(t), e(t+1)} = 1 - 6r$ when $e(t+1) = e'(t)$;
$P_{e'(t), e(t+1)} = r$ when $e(t+1)$ is a neighbor of $e'(t)$;
and $P_{e'(t), e(t+1)} = 0$ otherwise. Note that we always have
$e(t) - e'(t) = h'(t) - h(t)$, so the one-timeslot cost is
$C_a(e(t)) = c_m(\|e(t) - e'(t)\|) + c_d(\|e'(t)\|)$.

We note that, even after simplification with the offset model,
the 2-D offset-based MDP has a significantly larger number of








Figure 5. Example of 2-D offset model on hexagon cells, where N = 3. 

states compared with the distance-based MDP, because for a
distance-based model with $N$ states (excluding state zero), the
2-D offset model has $M = 3N^2 + 3N$ states (excluding state
$(0, 0)$). Therefore, we use the distance-based MDP proposed
in Section III to approximate the 2-D offset-based MDP,
which significantly reduces the computational time as shown
in Section IV-D.
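
The state count of the 2-D offset model can be checked by enumeration. The sketch below represents hexagon cells in axial coordinates $(q, r)$, which is an assumed representation and not the polar ring indexing $(i, j)$ used in the paper, and uses the standard hop distance for that coordinate system. Ring $i$ then contains $6i$ cells, so rings 1 to $N$ contain $3N^2 + 3N$ cells in total. Function names are hypothetical.

```python
def hex_distance(a, b):
    """Minimum number of hops between two hexagon cells given in axial coordinates."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def nonzero_offsets(N):
    """All offsets e with 0 < ||e|| <= N, i.e. the 2-D states excluding (0, 0)."""
    origin = (0, 0)
    return [
        (q, r)
        for q in range(-N, N + 1)
        for r in range(-N, N + 1)
        if 0 < hex_distance((q, r), origin) <= N
    ]


def ring_sizes(N):
    """Number of cells in each ring 1..N."""
    sizes = [0] * (N + 1)
    for cell in nonzero_offsets(N):
        sizes[hex_distance(cell, (0, 0))] += 1
    return sizes[1:]


state_counts = {N: len(nonzero_offsets(N)) for N in range(1, 7)}
```

For $N = 10$, the value used in the numerical evaluation, this gives 330 offset states against 10 distance states, which indicates the scale of the reduction.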

B. Approximation by Distance-based MDP 

In the approximation, the parameters of the distance-based
MDP are chosen as $p_0 = 6r$, $p = 2.5r$, and $q = 1.5r$. The
intuition behind this parameter choice is that, at state
$(i_0, j_0) = (0, 0)$ in
the 2-D MDP, the aggregate probability of transitioning to any
state in ring $i_1 = 1$ is $6r$, so we set $p_0 = 6r$; at any other state
$(i_0, j_0) \ne (0, 0)$, the aggregate probability of transitioning to
any state in the higher ring $i_1 = i_0 + 1$ is either $2r$ or $3r$, and
the aggregate probability of transitioning to any state in the
lower ring $i_1 = i_0 - 1$ is either $r$ or $2r$, so we set $p$ and $q$ to
the median value of these transition probabilities.
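
The neighbor counts behind this parameter choice can be verified by enumeration. The sketch below again assumes axial coordinates for the hexagon cells (not the paper's polar indexing) and counts, for every cell in rings 1 to $N$, how many of its six neighbors lie in the next higher and next lower ring. Multiplying those counts by $r$ gives the aggregate transition probabilities quoted above. All names are hypothetical.

```python
NEIGHBOR_STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def ring_index(cell):
    """Ring of a cell in axial coordinates, i.e. its hop distance from (0, 0)."""
    q, r = cell
    return (abs(q) + abs(r) + abs(q + r)) // 2


def ring_move_counts(cell):
    """Return (number of neighbors one ring higher, number of neighbors one ring lower)."""
    i = ring_index(cell)
    rings = [ring_index((cell[0] + dq, cell[1] + dr)) for dq, dr in NEIGHBOR_STEPS]
    return rings.count(i + 1), rings.count(i - 1)


def observed_counts(N):
    """Sets of upward and downward neighbor counts over all cells in rings 1..N."""
    ups, downs = set(), set()
    for q in range(-N, N + 1):
        for r in range(-N, N + 1):
            if 1 <= ring_index((q, r)) <= N:
                up, down = ring_move_counts((q, r))
                ups.add(up)
                downs.add(down)
    return ups, downs


up_counts, down_counts = observed_counts(5)
```

Corner cells of a ring have three outward neighbors and one inward neighbor, while the remaining cells have two of each, so $2.5r$ and $1.5r$ sit midway between the two possible values in each direction.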

To find the optimal policy for the 2-D MDP, we first
find the optimal policy for the distance-based MDP with the
parameters defined above. Then, we map the optimal policy
from the distance-based MDP to a policy for the 2-D MDP. To
explain this mapping, we note that, in the 2-D hexagon offset
model, there always exists at least one shortest path from any
state $(i, j)$ to ring $i'$, the length of this shortest path is $|i - i'|$,
and each ring between $i$ and $i'$ is traversed once on the shortest
path. For example, one shortest path from state $(3, 2)$ to ring
$i' = 1$ is $\{(3, 2), (2, 1), (1, 0)\}$. When the system is in state
$(i, j)$ and the optimal action from the distance-based MDP
is $a^*(i) = i'$, we perform migration on the shortest path from
$(i, j)$ to ring $i'$. If there exist multiple shortest paths, one path
is arbitrarily chosen. For example, if $a(3) = 2$ in the
distance-based MDP, then either $a(3, 2) = (2, 1)$ or $a(3, 2) = (2, 2)$
in the 2-D MDP. With this mapping, the one-timeslot cost
$C_a(d(t))$ for the distance-based MDP and the one-timeslot
cost $C_a(e(t))$ for the 2-D MDP are the same.

C. Bound on Approximation Error 

Error in the approximation arises because the transition
probabilities in the distance-based MDP are not exactly the
same as those in the 2-D MDP (there is at most a difference
of $0.5r$). In this subsection, we study the difference in the
discounted sum costs when using the policy obtained from
the distance-based MDP and the true optimal policy for the
2-D MDP. The result is summarized as Proposition 2.


Proposition 2. Let $V_{dist}(e)$ denote the discounted sum cost
when using the policy from the distance-based MDP, and let
$V^*(e)$ denote the discounted sum cost when using the true optimal
policy of the 2-D MDP. Then we have $V_{dist}(e) - V^*(e) \le
\frac{\gamma k}{1 - \gamma}$ for all $e$, where $k = \max_x \{c_m(x + 2) - c_m(x)\}$.

Proof. (Outline) The proof is completed in three steps. First,
we modify the states of the 2-D MDP in such a way that the
aggregate transition probability from any state $(i_0, j_0) \ne (0, 0)$
to ring $i_1 = i_0 + 1$ (correspondingly, $i_1 = i_0 - 1$) is $2.5r$
(correspondingly, $1.5r$). We assume that we use a given policy
on both the original and modified 2-D MDPs, and show a
bound on the difference in the discounted sum costs for these
two MDPs. In the second step, we show that the modified
2-D MDP is equivalent to the distance-based MDP. This can be
intuitively explained by the fact that the modified 2-D MDP
has the same transition probabilities as the distance-based
MDP when only considering the ring index $i$, and also, the
one-timeslot cost $C_a(e(t))$ only depends on $\|e(t) - a(e(t))\|$
and $\|a(e(t))\|$, both of which can be determined from the ring
indexes of $e(t)$ and $a(e(t))$. The third step uses the fact that the
optimal policy for the distance-based MDP cannot bring higher
discounted sum cost for the distance-based MDP (and hence
the modified 2-D MDP) than any other policy. By utilizing the
error bound found in the first step twice, we prove the result.
For details of the proof, see [11]. □

The error bound is a constant value when the parameters 
are given. It increases with $\gamma$. However, note that the absolute 
value of the discounted sum cost also increases with $\gamma$, so the 
relative error can remain low. 
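
To give a feel for the constant $k$ in Proposition 2, the following minimal sketch evaluates it numerically. It assumes an exponential migration cost of the form $c_m(x) = \beta_c + \beta_l \mu^x$ for $x > 0$ and $c_m(0) = 0$, and it assumes that the maximum is taken over $x = 0, \ldots, N-2$; both the functional form and the range are assumptions of this sketch, and the function names are hypothetical.

```python
def migration_cost(x, beta_c, beta_l, mu):
    """Assumed migration cost: zero for x == 0, beta_c + beta_l * mu**x otherwise."""
    return 0.0 if x == 0 else beta_c + beta_l * mu ** x


def max_two_step_increase(n_max, beta_c, beta_l, mu):
    """k = max over x in {0, ..., n_max - 2} of c_m(x + 2) - c_m(x) (assumed range)."""
    return max(
        migration_cost(x + 2, beta_c, beta_l, mu) - migration_cost(x, beta_c, beta_l, mu)
        for x in range(n_max - 1)
    )


# Illustrative parameters: beta_c + beta_l = 1, mu = 0.8, N = 10.
k = max_two_step_increase(n_max=10, beta_c=2.0, beta_l=-1.0, mu=0.8)
print(k)
```

With this assumed cost form, the largest two-step increase occurs at $x = 0$, because of the jump from zero cost (no migration) to a positive cost.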


D. Numerical Evaluation 


The error bound derived in Section IV-C is a worst-case upper 
bound of the error. In this subsection, we evaluate the 
performance of the proposed approximation method numerically, 
and focus on the average performance of the approximation. 

We consider 2-D random walk mobility with randomly 
chosen parameter $r$. The maximum user-service distance is 
set as $N = 10$. The transmission cost function parameters 
are selected as $\theta = 0.8$, $\delta_c = 1$, and $\delta_l = -1$. With these 
parameters, we have $\delta_c + \delta_l = 0$, which means that there is no 
constant portion in the cost function. For the migration cost, 
we choose $\mu = 0.8$ and fix $\beta_c + \beta_l = 1$ to represent a constant 
server processing cost for migration. The parameter $\beta_l < 0$ 
takes different values in the simulations, to represent different 
sizes of data to be migrated. 

The simulations are performed in MATLAB on a computer 
with Intel Core i7-2600 CPU, 8GB memory, and 64-bit Windows 7. 
We study the computation time (i.e., the time used to 
run the algorithm) and the discounted sum cost of the proposed 
approach that is based on approximating the original 2-D 
MDP with the distance-based MDP. For the computation time 
comparison, standard value and policy iteration approaches 
i fT^ Section 6] are used to solve the original 2-D MDP. The 
discounted sum cost from the proposed approach is compared 
with the costs from alternative policies, including the true 
optimal policy from standard policy iteration on the 2-D 
model, the never-migrate policy which never migrates except 
when at states in the outer ring $i = N$, the always-migrate 
policy which always migrates when the user and the service 
are at different locations, and the myopic policy that chooses 
actions to minimize the one-timeslot cost. 

Figure 6. Simulation result with 2-D random walk: (a) $\gamma = 0.5$, (b) $\gamma = 0.9$, (c) $\gamma = 0.99$. 

The simulations are run with 50 different random seeds, and 
the overall results are shown in Fig. 6 with different values of 
the discount factor $\gamma$. We can see that the proposed method 
brings discounted sum costs that are very close to the optimum. 
Meanwhile, the computation time when using the proposed 
method is only about 0.1% of the computation time of standard 
value and policy iteration approaches. For further discussion 
on the simulation results, see O- 
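
The baseline policies used for comparison in this subsection can be sketched on a distance-based state as follows. This is an illustrative sketch only. It assumes exponential cost functions that are zero at distance 0, assumes that an action is the new user-service distance after a possible migration, and assumes that the never-migrate policy moves the service to the user's location once the outermost distance is reached. It does not enforce any constraint on the myopic policy at the outermost distance. All function names are hypothetical.

```python
def make_costs(beta_c, beta_l, mu, delta_c, delta_l, theta):
    """Build assumed exponential cost functions (both zero at distance 0)."""
    def c_m(x):  # migration cost over distance x
        return 0.0 if x == 0 else beta_c + beta_l * mu ** x

    def c_d(x):  # transmission cost over distance x
        return 0.0 if x == 0 else delta_c + delta_l * theta ** x

    return c_m, c_d


def one_slot_cost(d, d_new, c_m, c_d):
    """Cost of migrating from distance d to d_new, then transmitting over d_new."""
    return c_m(abs(d - d_new)) + c_d(d_new)


def never_migrate(d, n_max, c_m, c_d):
    # Stay put unless the user has reached the outermost distance n_max.
    return 0 if d >= n_max else d


def always_migrate(d, n_max, c_m, c_d):
    # Always move the service to the user's current location.
    return 0


def myopic(d, n_max, c_m, c_d):
    # Choose the new distance that minimizes the one-timeslot cost only.
    return min(range(d + 1), key=lambda d_new: one_slot_cost(d, d_new, c_m, c_d))


c_m, c_d = make_costs(beta_c=2.0, beta_l=-1.0, mu=0.8,
                      delta_c=1.0, delta_l=-1.0, theta=0.8)
for policy in (never_migrate, always_migrate, myopic):
    print(policy.__name__, [policy(d, 10, c_m, c_d) for d in range(11)])
```

Each policy maps the current distance to a new distance, so any of them can be plugged into the same cost evaluation loop as a policy obtained from the MDP.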

V. Application to Real-World Scenarios 

In this section, we discuss how the aforementioned approaches 
can be applied to service migration in the real world, 
where multiple users and services are involved. 


A. Operating Procedure 

In the real-world scenario, we assume that each user follows 
a sample path of the 2-D hexagon mobility model. The parameter 
$r$ is estimated from the sample paths of multiple users 
(discussed in detail in Section V-A2). The cost parameters $\beta_c$, 
$\beta_l$, $\mu$, $\delta_c$, $\delta_l$, and $\theta$ are selected based on the actual application 
scenario, and their values vary with the background traffic load 
of the network and MEC servers. By appropriately adjusting 
the cost functions according to the load, the proposed approach 
can be used for load balancing; we will demonstrate how to 
achieve load balancing by a proper selection of cost parameters 
in Section V-B. The discount factor $\gamma$ is selected based on the 
duration of services, and a larger $\gamma$ is selected for a longer 
service duration. This is because the discount 
factor $\gamma$ determines the amount of time to look ahead: if a 
user only requires the service for a short time, then there is 
no need to consider the cost for the long-term future. 
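
One rough way to relate the discount factor to a look-ahead time is through the geometric series: the discount weights $1, \gamma, \gamma^2, \ldots$ sum to $1/(1-\gamma)$, which can be read as an effective horizon in timeslots. This interpretation is a common heuristic rather than a rule stated in the paper, and the function name below is hypothetical.

```python
def effective_horizon(gamma):
    """Sum of gamma**t over t = 0, 1, 2, ... for a discount factor in [0, 1)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


# Discount factors matching those used in the numerical evaluation.
for gamma in (0.5, 0.9, 0.99):
    print(gamma, effective_horizon(gamma))
```

Under this reading, a service expected to last on the order of ten timeslots would suggest a discount factor around 0.9, while a much longer service would suggest a value closer to 1.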

1) Initial Service Placement: Upon initialization, the service 
is placed onto the MEC that is connected to the same 
basestation as the user is currently connected to. In other 
words, when a service starts, the user and the service are in 
the same location, i.e., state $(0,0)$ of the 2-D MDP. This is 
due to the consideration that the initialization cost and the cost 
of further operation can be minimized by initially placing the 
service closest to the user. 

2) Dynamic Service Migration: After the initial service 
placement, the subsequent actions of possible service migration 
are performed according to the optimal policy found from 
the mechanisms proposed earlier in this paper. 

To facilitate the mapping between the real world and the 
MDP model, we define a timeslot length $T$ which is the same 
for all MECs. The parameter $T$ can be regarded as a protocol 
parameter, and different MECs do not need to synchronize 
on the timeslots. We also define a window length $T_w > T$, 
which specifies the amount of time to look back to estimate 
the parameter $r$. We consider the case where the parameter 
$r$ is the same across the whole geographical area, which is 
a reasonable assumption when different locations within the 
geographical area under consideration have similarities (for 
example, they all belong to an urban area). More sophisticated 
cases can be studied in the future. The optimal policy is 
computed by an MEC controller, and a policy update interval 
$T_u$ is defined. 

The operating procedure is described as follows. Each MEC 
obtains the identity of associated users of the cell connected 
to the MEC, at the beginning of each timeslot with length 
$T$. Based on this information, the MEC computes the number 
of users that have left the cell and the total number of users 
in the cell, and stores this information for each timeslot. At 
interval $T_u$, the MEC controller sends a request to all MECs 
to collect the current statistics. After receiving the request, 
each MEC $n_{\mathrm{MEC}}$ computes the empirical probability $f_{n_{\mathrm{MEC}}}$ of 
users moving outside of the cell, based on the statistics on 
the departed and total users within the duration of $T_w$. These 
empirical probabilities are sent together with other monitored 
information (such as current network and server load) to the 
controller. The controller then computes the average of the 
empirical probabilities $f_{n_{\mathrm{MEC}}}$, denoted by $f$, and updates parameter 
$r = f/6$. It also computes the transmission and migration cost 
parameters based on the current system condition. Based on 
these updated parameters, the controller computes the optimal 
policy under the current system condition and sends the result 
to the MECs. 
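
The parameter update performed by the controller can be sketched as follows. This is a minimal illustration under stated assumptions: each MEC reports per-timeslot counts of departed and total users over the look-back window, the empirical probability is taken as the ratio of the two sums, and MECs with no users in the window are skipped. The function names and the data layout are hypothetical.

```python
def empirical_departure_probability(departed_per_slot, total_per_slot):
    """Fraction of observed users that left the cell within the window (assumed estimator)."""
    total = sum(total_per_slot)
    if total == 0:
        return None  # no users observed at this MEC in the window
    return sum(departed_per_slot) / total


def estimate_r(per_mec_stats, num_neighbors=6):
    """Average the per-MEC probabilities into f and return r = f / 6 for hexagonal cells."""
    probs = [
        p
        for p in (empirical_departure_probability(d, t) for d, t in per_mec_stats)
        if p is not None
    ]
    if not probs:
        raise ValueError("no MEC reported any users in the window")
    f = sum(probs) / len(probs)
    return f / num_neighbors


# Two MECs, two timeslots each: (departed counts, total counts). Values are made up.
stats = [([1, 2], [10, 10]), ([0, 3], [5, 5])]
print(estimate_r(stats))
```

Only two counters per timeslot need to be stored at each MEC, which keeps the monitoring overhead small.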
Figure 7. Simulation result with real-world traces: (a) average cost per user in each timeslot over a day, where $R_t = R_p = 1.5$; (b)-(e) average cost reduction 
compared with alternative policies (error bars denote the standard deviation); (f) average cost. 


B. Trace-Driven Simulation 

We perform simulation with real-world mobility traces of 
536 taxis in San Francisco, collected on the day of May 31, 
2008 ITT) , fT4) . A hexagon cell structure with 500 m cell 
separation is assumed and each taxi is assumed to be connected 
to the nearest basestation. The parameters for data collection 
and policy update are set as $T = T_u = 60$ s, and $T_w = 3600$ s. 

We choose $N = 10$, $\mu = \theta = 0.8$, and $\gamma = 0.9$. 

1) Cost definition: It is assumed that the network load is 
proportional to the number of taxis in operation, and we denote 
the normalized total amount of transmission bandwidth resource 
(correspondingly, processing resource at MEC servers) 
with a factor $R_t$ (correspondingly, $R_p$). Then, we define a 
quantity $G_t$ (correspondingly, $G_p$) as 
$G_t = 1 / \left(1 - \frac{m_{\mathrm{cur}}}{R_t m_{\max}}\right)$ 
(correspondingly, $G_p = 1 / \left(1 - \frac{m_{\mathrm{cur}}}{R_p m_{\max}}\right)$), where $m_{\mathrm{cur}}$ denotes 
the number of taxis in operation at the time when 
the optimal policy is being computed, and $m_{\max}$ denotes the 
maximum number of taxis that may simultaneously operate at 
any time instant in the considered dataset. The cost parameters 
are then defined as $\beta_c = G_p + G_t$, $\beta_l = -G_t$, $\delta_c = G_t$, and 
$\delta_l = -G_t$. With such a definition, we have $\beta_c + \beta_l = G_p$ 
and $\delta_c + \delta_l = 0$, which means that the constant part of the 
migration cost is $G_p$ (to represent the processing cost for 
migration) and there is no constant part in the cost for data 
transmission. We set $\beta_l = \delta_l = -G_t$ because this part of cost 
can be regarded as related to data transmission in both $c_m(x)$ 
and $c_d(x)$. The choice of $G_t$ and $G_p$ can serve for the purpose 
of load balancing and can also represent the delay of data 
transmission ($G_t$) and processing ($G_p$). 
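
The load-dependent cost parameters can be illustrated with the short sketch below. It assumes that the load quantities take the form $1/(1 - m_{\mathrm{cur}}/(R\, m_{\max}))$, with $R$ being the transmission or processing resource factor; this form is an assumption of the sketch, inferred from the surrounding definitions. The function names and the numerical inputs are hypothetical.

```python
def load_factor(m_cur, m_max, resource):
    """Assumed load quantity 1 / (1 - m_cur / (resource * m_max))."""
    utilization = m_cur / (resource * m_max)
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must lie in [0, 1)")
    return 1.0 / (1.0 - utilization)


def cost_parameters(m_cur, m_max, r_t, r_p):
    """Map the current load to migration and transmission cost parameters."""
    g_t = load_factor(m_cur, m_max, r_t)  # transmission-related quantity
    g_p = load_factor(m_cur, m_max, r_p)  # processing-related quantity
    return {
        "beta_c": g_p + g_t,  # so that beta_c + beta_l equals g_p
        "beta_l": -g_t,
        "delta_c": g_t,       # so that delta_c + delta_l equals 0
        "delta_l": -g_t,
    }


# Made-up load: 300 of at most 400 taxis in operation, resource factors 1.5.
params = cost_parameters(m_cur=300, m_max=400, r_t=1.5, r_p=1.5)
print(params)
```

With this assumed form, the cost parameters grow sharply as the number of active users approaches the available resources, which is what discourages migrations and long-distance transmissions under heavy load.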


2) Results: The average cost per user in each timeslot (i.e., 
the $C_a(s(t))$ values) is collected and shown in Fig. 7. Denote 
the cost of the proposed method as $C$ and the cost of the 
method under comparison as $C_0$; then the cost reduction is 
defined as $(C_0 - C)/C_0$. The results show that the proposed 
approach is beneficial, with cost reductions ranging from 9% 
to 54% compared with the never-migrate, always-migrate, or myopic 
policy. The results also show that the costs fluctuate (due 
to different system load) over the day, and they vary with 
the total amount of available resources, which implies 
that it is necessary to compute the optimal policy in real time, 
based on recent observations of the system condition. 
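
The cost reduction metric used in these results is the relative saving of the proposed method against a baseline. A minimal sketch follows; the function name is hypothetical and the numbers in the example are made up purely to show the arithmetic.

```python
def cost_reduction(cost_proposed, cost_baseline):
    """Relative saving: (baseline cost - proposed cost) / baseline cost."""
    if cost_baseline <= 0:
        raise ValueError("baseline cost must be positive")
    return (cost_baseline - cost_proposed) / cost_baseline


# Made-up costs: a proposed cost of 0.46 against a baseline cost of 1.0.
print(cost_reduction(0.46, 1.0))
```

A positive value means the proposed method is cheaper than the baseline, and a value of zero means the two perform equally.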

VI. Discussions 

We have made some assumptions to make the problem theoretically 
tractable. In this section, we justify these assumptions 
from a practical point of view. 

Cost Functions: To ease our discussion, we have limited our 
attention to transmission and migration costs in this paper. This 
can be extended to include more sophisticated cost models. 
For example, the transmission cost can be extended to include 
the computational cost of hosting the service at an MEC, by 
adding a constant value to the transmission cost expression. 
As in Section V-B, the cost values can also be time-varying 
and related to the background system load. Furthermore, the 
cost definition can be practically regarded as the average cost 
over all locations, which means that when seen from a single 
location, the monotonicity of cost values with distances does 
not need to apply. This makes the proposed approach less 
restrictive in terms of practical applicability. 

We also note that it is generally possible to formulate an 
MDP with additional dimensions in cost modeling, such as 
one that includes the state of the network, load at each specific 
MEC hosting the service, state of the service to avoid service 
interruption when in critical state, etc. However, this requires a 
significantly larger state space compared to our formulation in 
this paper, as we need to include those network/MEC/service 
states in the state space of the MDP. There is a tradeoff 
between the complexity of solving the problem and accuracy 
of cost modeling. Such issues can be studied in the future. 
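
The growth of the state space can be made concrete with a small count. The sketch below assumes that the 2-D state space consists of the hexagonal cells within $N$ rings (ring $i$ containing $6i$ cells) and that the distance-based MDP has one state per distance $0, \ldots, N$. The extra dimensions and their numbers of levels are purely hypothetical.

```python
def hex_states(n_rings):
    """Number of hexagonal cells within n_rings rings: 1 + 6 * (1 + 2 + ... + n_rings)."""
    return 1 + 3 * n_rings * (n_rings + 1)


def distance_states(n_rings):
    """Number of states when only the distance 0, ..., n_rings is tracked."""
    return n_rings + 1


def augmented_states(base_states, *extra_dimension_sizes):
    """State count after taking the product with additional discretized dimensions."""
    total = base_states
    for size in extra_dimension_sizes:
        total *= size
    return total


n = 10
print(distance_states(n), hex_states(n))
# Hypothetical extension: 5 load levels and 3 service states per location state.
print(augmented_states(hex_states(n), 5, 3))
```

Each added dimension multiplies the number of states, which is why richer cost models trade accuracy against the cost of solving the MDP.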

Single/Multiple Users: As pointed out earlier in the paper, although 
we focused on a single user in our problem modeling, 
practical cases involving multiple users running independent 
services can be considered by setting cost functions related 
to the background traffic generated by other users, as in 
Section V-B. For more complicated cases such as multiple 
users sharing the same service, or where the placement of 
different services is strongly coupled and reflected in the cost 
value, we can formulate the problem as an MDP with a larger 
state space. The details are left for future work, where we 
envision that approximation techniques similar to those in this paper can 
be used to approximately solve the resulting MDP. 

Transition Probability in MDP: In the theoretical modeling, 
the transition probabilities in the MDP are assumed to be 
known. In practice, these can be estimated from the cell 
association history of users, as discussed in Section V-A. 

Uniform Random Walk: The uniform random walk mobility 
model is used as a modeling assumption, which not only 
simplifies the theoretical analysis, but also makes the practical 
implementation of the proposed method fairly simple in the 
sense that only the empirical probability of users moving 
outside of the cell needs to be recorded (see Section V-A2). 
This model can capture the average mobility of a large number 
of users. The simulation results in Section V-B confirm that 
this model provides good performance, even though individual 
users do not necessarily follow a uniform random walk. 

MEC Controller: The MEC controller does not need to be 
a separate cloud entity. Rather, it can be a service running at 
one of the MECs. The additional overhead incurred by the 
proposed approach is low, because it has low computational 
complexity and the interactions between each MEC and the 
MEC controller are infrequent (the interval is specified by $T_u$). 


VII. Conclusions 

In this paper, we have studied service migration in MECs. 
The problem is formulated as an MDP, but its state space can 
be arbitrarily large. To make the problem tractable, we have 
reduced the general problem into an MDP that only considers 
a meaningful parameter, namely the distance between the user 
and the service. The distance-based MDP has several structural 
properties that allow us to develop an efficient algorithm to 
find its optimal policy. We have then shown that the distance- 
based MDP is a good approximation to scenarios where the 
users move in a 2-D space, which is confirmed by analytical 
and numerical evaluations and also by simulations with real- 
world traces of taxis in San Francisco. 

For ease of presentation, in this paper we have assumed 
that MECs are colocated with basestations. However, our 
proposed approach is not restricted to such cases and can easily 
incorporate scenarios where MECs are not colocated with 
basestations as long as the costs are geographically dependent. 

The results in this paper provide an efficient solution to 
service migration in MECs. Further, we envision that the 
approaches used in this paper can be extended to a range of 
other problems that share similar properties. The highlights 
of our approaches include: a closed-form solution to the 
discounted sum cost of a particular class of MDPs, which can be 
used to simplify the procedure of finding the optimal policy; 
a method to approximate an MDP (in a particular class) with 
one that has smaller state space, where the approximation error 
can be shown analytically; and a method to collect statistics 
from the real-world to serve as parameters of the MDP. 